Now let's play with databases. MySQL and SQLite databases are accessed in a very similar fashion in Python.
Let's install the different packages
pip install MySQL-python
pip install sqlalchemy
pip install ipython-sql
pip install dbtools
SQLite and MySQL have very similar features and query language, but also some subtle differences. The main difference is that MySQL requires a running server to host the data, while SQLite stores the data in a local file. Some pros and cons:
SQLite: serverless and zero-configuration, the whole database lives in a single file, but it is less suited to many concurrent writers.
MySQL: requires a server and some administration, but handles concurrent multi-user access over the network.
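For instance, a complete SQLite session needs nothing but the standard-library sqlite3 module. A minimal sketch (the table and column names here are made up for illustration):

```python
import sqlite3

# An in-memory database for illustration; pass a filename such as 'demo.db'
# to persist the data to a local file instead.
db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('CREATE TABLE demo (name TEXT, value REAL)')
c.execute('INSERT INTO demo VALUES (?, ?)', ('alpha', 1.5))
db.commit()
c.execute('SELECT * FROM demo')
rows = c.fetchall()
db.close()
print(rows)
```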
Start with SQLite; you can then test out MySQL by setting:
sql_type = 'mysql'
In [1]:
sql_type = 'mysql'
In [2]:
user_name = "pire"
In [3]:
tbname = "py4we_" + user_name
Importing the logit magic for benchmarking the different methods
In [4]:
%load_ext logit
Let's load the data previously saved in json format
In [5]:
import json
with open('tmp.json','r') as f:
    json_dic = json.load(f)
json_dic.keys()
Out[5]:
In [6]:
#first replace the dates by datetime formats
from datetime import datetime
from numpy import array
#fmt = "%Y%m%d %H:%M:%S"
#json_dic['dates'] = [datetime.strptime(r, fmt) for r in json_dic['dates']]
#Then build a numpy array containing all the information
db_data = array([json_dic[c] for c in ['dates', 'weekdays', 'latitude', 'longitude', 'orientation']]).T
db_data[:10]
Out[6]:
The sql magic is the easiest way to use SQL, but it only works within IPython notebooks, and it has some limitations.
First you need to load the sql magic.
In [5]:
%load_ext sql
It works both for SQLite and MySQL (and PostgreSQL, if you are into that sort of thing).
In [8]:
if sql_type == 'sqlite':
    ## Connect
    %sql sqlite:///mysqlite.db
    # Print out what the tables are
    res = %sql SELECT * FROM sqlite_master WHERE type='table'
elif sql_type == 'mysql':
    ## Connect
    %sql mysql://student:DTU63Course@10.40.20.14/coursedb
    ## Print out what the tables are
    res = %sql SHOW TABLES
res
Out[8]:
Delete the table
In [9]:
%sql DROP TABLE test_sql
Out[9]:
Create the Table
In [10]:
%%sql
CREATE TABLE test_sql
(
    dates VARCHAR(20),
    weekdays VARCHAR(10),
    latitude DOUBLE,
    longitude DOUBLE,
    orientation DOUBLE
);
Out[10]:
It's possible to insert the data into the database line by line, but this is painfully slow for large datasets.
Notice how the Python variables are injected into the SQL query by prefixing them with ":". It's a bit tricky, because the variable name is replaced by its value wrapped in quotes ''. Unfortunately there is a small bug when you try to use this to set the table name: :table_name is replaced by the quoted string 'table_name', which is not a valid SQL query.
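The same limitation exists with plain sqlite3 placeholders: parameter substitution quotes values as string literals, so it works for column values but not for identifiers such as table names. A minimal sketch (with a made-up table):

```python
import sqlite3

db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('CREATE TABLE test_sql (dates TEXT, orientation REAL)')

# Placeholders are fine for values...
c.execute('INSERT INTO test_sql VALUES (?, ?)', ('20130101', 42.0))

# ...but not for identifiers: a parameter cannot stand in for a table name
try:
    c.execute('SELECT * FROM ?', ('test_sql',))
except sqlite3.OperationalError as e:
    print('Failed as expected: %s' % e)

# The table name has to be built into the query string instead
c.execute('SELECT * FROM ' + 'test_sql')
rows = c.fetchall()
print(rows)
```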
In [11]:
N = 20
for date, weekd, latitude, longitude, orientation in db_data[:N]:
    res = %sql INSERT INTO test_sql VALUES (:date, :weekd, :latitude, :longitude, :orientation);
%sql SELECT * FROM test_sql
Out[11]:
It's easy to obtain a pandas.DataFrame from the query result
In [12]:
results = %sql SELECT * FROM test_sql
pres = results.DataFrame()
plot(pres.latitude, pres.longitude, '.')
Out[12]:
Using the SQLite3 and MySQLdb modules
In [13]:
### You can select here which method to use: sqlite or mysql
if sql_type == 'sqlite':
    import sqlite3
    db = sqlite3.connect('mysqlite.db')
elif sql_type == 'mysql':
    import MySQLdb
    db = MySQLdb.connect(host="10.40.20.14", user="student", passwd="DTU63Course", db="coursedb")
c = db.cursor()
(Drop and) Create the table
In [14]:
try:
    c.execute("DROP TABLE "+tbname)
except:
    print tbname+' doesn\'t exist yet'
c.execute("""
CREATE TABLE %s (
    dates VARCHAR(20),
    weekdays VARCHAR(10),
    latitude DOUBLE,
    longitude DOUBLE,
    orientation DOUBLE
)"""%(tbname))
Out[14]:
Insert all the data and read it
In [15]:
%%logit executemany, mysqlite.db, sql_type
# Insert all the data from db_data
if sql_type == 'sqlite':
    query_insert = 'INSERT INTO '+tbname+' VALUES (?,?,?,?,?)'
elif sql_type == 'mysql':
    query_insert = 'INSERT INTO '+tbname+' VALUES (%s,%s,%s,%s,%s)'
## Notice the executemany instead of execute: it's much faster when inserting data.
c.executemany(query_insert, db_data.tolist())
# Don't forget to commit if you want to save your modifications to the database
db.commit()
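To get a feel for the speed-up of executemany over a per-row execute loop, here is a small timing sketch on an in-memory SQLite database with made-up data (absolute timings will vary between machines):

```python
import sqlite3
import time

# Made-up data: 20,000 rows of (name, value) pairs
rows = [('row%d' % i, float(i)) for i in range(20000)]

def time_insert(use_executemany):
    """Insert all rows and return the elapsed time in seconds."""
    db = sqlite3.connect(':memory:')
    c = db.cursor()
    c.execute('CREATE TABLE t (name TEXT, value REAL)')
    t0 = time.time()
    if use_executemany:
        c.executemany('INSERT INTO t VALUES (?, ?)', rows)
    else:
        for row in rows:
            c.execute('INSERT INTO t VALUES (?, ?)', row)
    db.commit()
    elapsed = time.time() - t0
    db.close()
    return elapsed

print('execute loop : %.3fs' % time_insert(False))
print('executemany  : %.3fs' % time_insert(True))
```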
In [16]:
# Read all the data back
c.execute('SELECT * FROM '+tbname)
#Print the first ten rows
for row in c.fetchall()[:10]:
    print row
# Don't forget to close the database
c.close()
Pandas has a SQL interface that can use the sqlite3 or MySQLdb connectors.
In [17]:
import pandas as pd
import pandas.io.sql as psql
df1 = psql.frame_query('SELECT * FROM '+tbname, db)
# Indicate that the index is of date format
df1.index = pd.DatetimeIndex(df1.dates)
df1.head()
Out[17]:
In [18]:
df1.plot()
Out[18]:
You can also use pandas directly to write to your database
In [19]:
%%logit pandas_write_frame, mysqlite.db, sql_type
if psql.table_exists(tbname, db, sql_type):
    psql.execute('DROP TABLE '+tbname, db)
psql.write_frame(df1, tbname, db, flavor=sql_type)
Let's time it using sqlite
And reading again from it
In [20]:
psql.frame_query('SELECT * FROM '+tbname, db).head()
Out[20]:
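Note that in newer pandas versions frame_query and write_frame have been replaced by read_sql and to_sql. A minimal sketch of the equivalent calls, using an in-memory SQLite database and made-up data:

```python
import sqlite3
import pandas as pd

db = sqlite3.connect(':memory:')
df = pd.DataFrame({'latitude': [55.7, 55.8], 'longitude': [12.5, 12.6]})

# to_sql replaces write_frame; read_sql replaces frame_query
df.to_sql('positions', db, if_exists='replace', index=False)
back = pd.read_sql('SELECT * FROM positions', db)
print(back)
```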
dbtools is a tool to access SQLite data files. It is inspired by SQLAlchemy, but is much simpler to use (though also less powerful and less stable).
In [21]:
from dbtools import Table
try:
    Table("mysqlite_dtools.db", tbname).drop()
except:
    pass
tbl = Table.create("mysqlite_dtools.db", tbname,
                   [('dates', str),
                    ('weekdays', str),
                    ('latitude', float),
                    ('longitude', float),
                    ('orientation', float)])
tbl.columns
Out[21]:
Once it's created, you can just use it this way. It's very slow, so we reduce it to the first 10,000 entries.
In [22]:
%%logit dbtools_insertlist, mysqlite_dtools.db, sqlite
tbl = Table("mysqlite_dtools.db", tbname)
tbl.insert(db_data[:10000])
The select() function directly returns a pandas DataFrame.
In [23]:
df2 = tbl.select()
df2.plot()
Out[23]:
The select() function can also express a SELECT query:
In [24]:
### This is equivalent to this query:
### SELECT dates, orientation FROM py4we_tb WHERE weekdays="Thursday";
### or df3 = df2[df2.weekdays=='Thursday']
df3 = tbl.select(['orientation', 'latitude'], where='weekdays="Thursday"')
df3.plot()
df3.head()
Out[24]:
It's possible to use a pandas.DataFrame as input to create a new table, but it's even slower:
In [25]:
%%logit dbtools_insertpandas, mysqlite_dtools2.db, sqlite
try:
    Table("mysqlite_dtools.db", tbname).drop()
except:
    print tbname, "doesn't exist"
Table.create("mysqlite_dtools.db", tbname, df2)
Unfortunately the module is still in its infancy, and can't (yet?) work with sqlite files that haven't been created by it:
In [26]:
df5 = Table("mysqlite.db", tbname).select()
In [27]:
from logit import plot_stats
df = plot_stats()
You are now going to apply your knowledge by uploading the results from the previous exercises to a MySQL database. You should upload the results logged in the JSON file to the py4we_results table.
The py4we_results table is organised as follows (see the DESCRIBE output below):
In [36]:
%sql mysql://student:DTU63Course@10.40.20.14/coursedb
%sql DESCRIBE py4we_exercise
Out[36]:
In [48]:
%sql SELECT * FROM py4we_exercise LIMIT 10
Out[48]:
In [1]:
import sqlite3
import pandas as pd
import pandas.io.sql as psql
import MySQLdb
dbmysql = MySQLdb.connect(host="10.40.20.14", user="student", passwd="DTU63Course", db="coursedb")
df = pd.io.sql.frame_query('SELECT * FROM py4we_exercise', dbmysql)
dblite = sqlite3.connect('sqlite_backup.db')
pd.io.sql.write_frame(df, 'py4we_exercise', dblite, flavor='sqlite', if_exists='append')
In [ ]:
import sqlite3
dblite = sqlite3.connect('sqlite_logger.db')
df = pd.io.sql.frame_query('SELECT * FROM logger', dblite)
dbmysql = MySQLdb.connect(host="10.40.20.14", user="student", passwd="DTU63Course", db="coursedb")
pd.io.sql.write_frame(df, 'py4we_exercise', dbmysql, flavor='mysql', if_exists='append')
In [2]:
df.head()
Out[2]:
Plotting the benchmarks
This line computes the mean time for each (userid, testid) pair, pivots the result so that tests become rows and users become columns, and plots it as horizontal bars:
In [32]:
df.groupby(['userid', 'testid'])['time'].mean().reset_index() \
.pivot(index='testid', columns='userid', values='time').plot(kind='barh')
Out[32]:
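On a small made-up DataFrame the same groupby/pivot chain looks like this:

```python
import pandas as pd

# Hypothetical benchmark log: two users, two tests, repeated runs
df = pd.DataFrame({
    'userid': ['a', 'a', 'b', 'b', 'a'],
    'testid': ['t1', 't2', 't1', 't2', 't1'],
    'time':   [1.0,  2.0,  3.0,  4.0,  5.0],
})

# Mean time per (userid, testid) pair, then pivot: tests as rows, users as columns
pivoted = df.groupby(['userid', 'testid'])['time'].mean().reset_index() \
            .pivot(index='testid', columns='userid', values='time')
print(pivoted)
```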
In [46]:
#sql DROP TABLE py4we_exercise
In [ ]:
##sql
CREATE TABLE py4we_exercise (
    userid VARCHAR(10),
    date VARCHAR(20),
    testid VARCHAR(50),
    filesize DOUBLE,
    time DOUBLE
);